Probabilistic Verification

Andrew Singleton

Summary Scores

  • Hexbins
  • Bias
  • Spread :: Skill (RMSE)
  • Spread :: Skill (STDE)
  • Rank Histogram
  • Continuous Ranked Probability Score (CRPS)

Hexbins

Directly compare forecasts & observations

  • Like a “scatter” plot
  • Hexagonal bins have equidistant neighbours in all directions

Distribution of biases

Too cool when it’s warm

Too warm when it’s cooler

Bias

Bias of the ensemble mean

Bias of individual members
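
The two views of bias above can be sketched in a few lines. This is a minimal illustration, not the method used to produce the plots: the toy data and the function names `ensemble_mean_bias` and `member_biases` are made up for the example, and bias is assumed to mean the average of forecast minus observation.

```python
from statistics import mean

# Hypothetical toy data: 3 cases, 3 members each (degrees C).
forecasts = [
    [11.0, 12.0, 13.0],
    [14.0, 15.0, 16.0],
    [9.0, 10.0, 11.0],
]
observations = [11.0, 14.0, 10.0]

def ensemble_mean_bias(fcst, obs):
    """Average over cases of (ensemble mean - observation)."""
    return mean(mean(members) - o for members, o in zip(fcst, obs))

def member_biases(fcst, obs):
    """Average error of each member, computed separately over all cases."""
    n_members = len(fcst[0])
    return [
        mean(case[m] - o for case, o in zip(fcst, obs))
        for m in range(n_members)
    ]

print(ensemble_mean_bias(forecasts, observations))
print(member_biases(forecasts, observations))
```

With equally weighted members, the average of the member biases equals the bias of the ensemble mean, so the member view mainly shows whether individual members are systematically offset from each other.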

Improved bias?

Improved bias?

Spread :: Skill (RMSE)

What is skill (RMSE)?

\(\sqrt{\frac{1}{N}\sum\limits_{i=1}^{N} \left(\overline{f} - o\right)^{2}}\)

What is spread?

\(\sqrt{\frac{1}{N}\sum\limits_{i=1}^{N} \left(Var\left(f\right)\right)}\)
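
As a rough sketch of the two formulas above: skill is the RMSE of the ensemble mean, and spread is the square root of the mean ensemble variance. The function names and toy data are hypothetical, and the per-case variance is assumed to be the sample variance (denominator M - 1); the slides do not say which estimator is intended.

```python
import math
from statistics import mean, variance

# Hypothetical toy data: 3 cases, 3 members each.
forecasts = [
    [11.0, 12.0, 13.0],
    [14.0, 15.0, 16.0],
    [9.0, 10.0, 11.0],
]
observations = [11.0, 14.0, 10.0]

def rmse_skill(fcst, obs):
    """RMSE of the ensemble mean over all cases."""
    return math.sqrt(
        mean((mean(members) - o) ** 2 for members, o in zip(fcst, obs))
    )

def spread(fcst):
    """Square root of the case-averaged ensemble variance.

    Assumption: statistics.variance (sample variance) is used per case.
    """
    return math.sqrt(mean(variance(members) for members in fcst))

print(rmse_skill(forecasts, observations), spread(forecasts))
```

For a well-calibrated ensemble the two numbers should be close; spread well below skill suggests an underdispersive ensemble.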

What is skill (STDE)?

\(\sqrt{\frac{1}{NM-1}\sum\limits_{i=1}^{NM} \left(f_{i} - o\right)^{2}}\)

What is spread?

\(\sqrt{\frac{1}{N}\sum\limits_{i=1}^{N} \left(Var\left(f\right)\right)}\)

Impact of bias

RMSE

Impact of bias

STDE

Rank histograms

Rank each observation among the ensemble members

  • Shows dispersion of ensemble
  • Aiming for normalized frequency of 1 for each rank
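
A minimal sketch of how such a histogram could be built, assuming ties between the observation and a member are ignored and that frequencies are normalized by the count expected under a flat histogram. The function names and data are hypothetical.

```python
def observation_rank(members, obs):
    """Rank of the observation among the members, from 1 to M + 1."""
    return 1 + sum(1 for m in members if m < obs)

def rank_histogram(fcst, obs):
    """Normalized frequency of each rank (1.0 means 'as often as expected')."""
    n_ranks = len(fcst[0]) + 1
    counts = [0] * n_ranks
    for members, o in zip(fcst, obs):
        counts[observation_rank(members, o) - 1] += 1
    expected = len(obs) / n_ranks  # count per rank for a flat histogram
    return [c / expected for c in counts]

# Hypothetical toy data: the observation falls once in each of the 4 ranks.
forecasts = [[1.0, 2.0, 3.0]] * 4
observations = [0.5, 1.5, 2.5, 3.5]
print(rank_histogram(forecasts, observations))  # [1.0, 1.0, 1.0, 1.0]
```

A U shape (outer ranks above 1) points to too little spread, a dome to too much, and a slope to bias.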

Continuous Ranked Probability Score (CRPS)

Mean difference between ensemble CDF and observations

  • Integrated squared difference between the forecast CDF and the observed step function
  • Averaged over all cases
  • Reduces to the mean absolute error for a deterministic forecast
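
One common way to compute the CRPS for a finite ensemble treats the members as an empirical CDF, which gives a closed form. The sketch below assumes that estimator; other estimators exist and the slides do not say which one is used. Function names and data are hypothetical.

```python
def crps_ensemble(members, obs):
    """CRPS of the empirical CDF of the members against one observation.

    Assumed estimator:
    mean|x_i - y| - (1 / (2 M^2)) * sum_i sum_j |x_i - x_j|
    """
    m = len(members)
    term_obs = sum(abs(x - obs) for x in members) / m
    term_pairs = sum(abs(xi - xj) for xi in members for xj in members) / (2 * m * m)
    return term_obs - term_pairs

def mean_crps(fcst, obs):
    """Average CRPS over all cases."""
    return sum(crps_ensemble(f, o) for f, o in zip(fcst, obs)) / len(obs)

# A single-member "ensemble" gives the absolute error.
print(crps_ensemble([3.0], 1.0))       # 2.0
# Two members straddling the observation.
print(crps_ensemble([0.0, 2.0], 1.0))  # 0.5
```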

Bias is a component of CRPS

Impact of bias

  • 1\(^\circ\) positive bias, 1\(^\circ\) negative bias

Threshold Scores

Brier Score

Mean Square Error in probability space for a given threshold

  • \(BS=\frac{1}{N}\sum\limits_{i=1}^{N}\left(f_i-o_i\right)^2\)
  • Range [0, 1]
  • Perfect score = 0
  • Sensitive to event rarity
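
A small sketch of the Brier Score, assuming \(f_i\) is the fraction of members at or above the threshold and \(o_i\) is 1 when the observation is at or above it and 0 otherwise. Whether the comparison is "at or above" or "strictly above" is an assumption here, and the function names are hypothetical.

```python
def event_probability(members, threshold):
    """Forecast probability: fraction of members at or above the threshold."""
    return sum(1 for m in members if m >= threshold) / len(members)

def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and outcome (0/1)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Hypothetical toy data: 4 cases, 2 members each, threshold of 10.
forecasts = [[12.0, 11.0], [5.0, 6.0], [9.0, 11.0], [10.5, 8.0]]
observations = [13.0, 4.0, 10.0, 7.0]
threshold = 10.0

probs = [event_probability(f, threshold) for f in forecasts]
outcomes = [1 if o >= threshold else 0 for o in observations]
print(probs, outcomes, brier_score(probs, outcomes))
```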

Brier Skill score

  • Compares Brier Score with a reference (usually sample climatology)
  • \(BSS=1-\frac{BS}{BS_{ref}}\)
  • Range [\(-\infty\), 1]
  • \(\leq 0\) -> No skill
  • 1 -> Perfect skill
  • Tends to \(-\infty\) when there are no events in the sample, since \(BS_{ref}=0\) for sample climatology
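
A sketch of the skill score with sample climatology as the reference, meaning a constant forecast equal to the observed event frequency. The handling of the no-event case (returning negative infinity, or NaN when the forecast is also perfect) is a choice made for this example, and the function names are hypothetical.

```python
import math

def brier_score(probs, outcomes):
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, outcomes):
    """BSS against sample climatology (constant forecast = base rate)."""
    base_rate = sum(outcomes) / len(outcomes)
    bs = brier_score(probs, outcomes)
    bs_ref = brier_score([base_rate] * len(outcomes), outcomes)
    if bs_ref == 0.0:
        # No events (or only events): the reference is perfect by construction.
        return float("nan") if bs == 0.0 else -math.inf
    return 1.0 - bs / bs_ref

print(brier_skill_score([1.0, 0.0, 0.5, 0.5], [1, 0, 1, 0]))  # 0.5
print(brier_skill_score([0.5, 0.0], [0, 0]))                  # -inf
```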

Reliability

Compares forecast probability with observed frequency

  • With what frequency did the event occur for a given forecast probability?
  • Observed frequency should equal forecast probability
  • Easy to interpret and communicate
  • Difficult to reduce to a single number
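
The points of a reliability diagram can be sketched by grouping cases on their forecast probability and computing how often the event was observed in each group. Grouping on exact probability values is an assumption that suits ensemble-derived probabilities (multiples of 1/M); continuous probabilities would need binning. The function name is hypothetical.

```python
from collections import defaultdict

def reliability_table(probs, outcomes):
    """Map each distinct forecast probability to the observed event frequency."""
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)
    return {p: sum(obs) / len(obs) for p, obs in sorted(groups.items())}

# Hypothetical toy data: a perfectly reliable set of forecasts.
probs = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
outcomes = [0, 0, 1, 0, 1, 1]
print(reliability_table(probs, outcomes))  # {0.0: 0.0, 0.5: 0.5, 1.0: 1.0}
```

Plotting the values against the keys gives the reliability curve; points on the diagonal indicate reliable probabilities.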

Relative / Receiver Operating Characteristic

Compares Hit Rate with False Alarm Rate

  • Given that the event occurred, how did the forecast do for each probability?
  • \(HR=\frac{H}{H+M}\)
  • \(FAR=\frac{FA}{FA + CR}\)
  • Top left is best
  • Can be reduced to area under the ROC curve
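
A sketch of how the ROC points and the area under the curve might be computed: each distinct forecast probability is used as a decision threshold, the event is forecast when the probability is at or above it, and the area is integrated with the trapezoidal rule. The function names are hypothetical, and the sample is assumed to contain at least one event and one non-event.

```python
def roc_points(probs, outcomes):
    """(FAR, HR) pairs, from the strictest to the most lenient threshold."""
    if not (0 < sum(outcomes) < len(outcomes)):
        raise ValueError("need at least one event and one non-event")
    points = [(0.0, 0.0)]  # never forecasting the event
    for t in sorted(set(probs), reverse=True):
        pairs = list(zip(probs, outcomes))
        hits = sum(1 for p, o in pairs if p >= t and o == 1)
        misses = sum(1 for p, o in pairs if p < t and o == 1)
        false_alarms = sum(1 for p, o in pairs if p >= t and o == 0)
        correct_rej = sum(1 for p, o in pairs if p < t and o == 0)
        hr = hits / (hits + misses)
        far = false_alarms / (false_alarms + correct_rej)
        points.append((far, hr))
    return points  # the lowest threshold forecasts every case: (1.0, 1.0)

def roc_area(points):
    """Trapezoidal area under the ROC curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

pts = roc_points([1.0, 0.5, 0.5, 0.0], [1, 1, 0, 0])
print(pts, roc_area(pts))
```

An area of 1 corresponds to perfect discrimination and 0.5 to the diagonal, where hit rate and false alarm rate are equal.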

Area under ROC curve summarises

Brier Score Decomposition

Reliability, Resolution and Uncertainty

  • \(BS=BS_{rel}-BS_{res}+BS_{unc}\)
  • Perfect \(BS_{rel} = 0\)
  • Perfect \(BS_{res}=BS_{unc}\)
  • \(BS_{unc}\) is a function of the observations only
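
The three terms can be sketched with the usual grouped form of the decomposition, assuming cases are grouped by their exact forecast probability (for which the sum reproduces the Brier Score exactly). With \(n_k\) cases in group \(k\), forecast probability \(p_k\), observed frequency \(\overline{o}_k\) and overall base rate \(\overline{o}\): reliability is the weighted mean of \((p_k-\overline{o}_k)^2\), resolution is the weighted mean of \((\overline{o}_k-\overline{o})^2\), and uncertainty is \(\overline{o}(1-\overline{o})\). The function name is hypothetical.

```python
from collections import defaultdict

def brier_decomposition(probs, outcomes):
    """Return (reliability, resolution, uncertainty) of the Brier Score."""
    n = len(probs)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)
    rel = sum(len(g) * (p - sum(g) / len(g)) ** 2 for p, g in groups.items()) / n
    res = sum(len(g) * (sum(g) / len(g) - base_rate) ** 2 for g in groups.values()) / n
    unc = base_rate * (1.0 - base_rate)  # depends on the observations only
    return rel, res, unc

probs = [1.0, 0.0, 0.5, 0.5]
outcomes = [1, 0, 1, 0]
rel, res, unc = brier_decomposition(probs, outcomes)
bs = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)
print(rel, res, unc, rel - res + unc, bs)
```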

Economic value

User oriented score

  • For which cost/loss ratios is the ensemble better than climatology?
  • Usually sample climatology
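
The slides do not give a formula, so the following is only a sketch based on the static cost-loss model commonly used for this score. A user pays a cost C to protect, or suffers a loss L if the event occurs unprotected; expenses are expressed in units of L. The function name and argument names are hypothetical.

```python
def relative_value(hit_rate, false_alarm_rate, base_rate, cost_loss_ratio):
    """Relative economic value: 1 = perfect forecast, 0 = climatology.

    Assumes 0 < cost_loss_ratio < 1 and 0 < base_rate < 1.
    """
    a, s = cost_loss_ratio, base_rate
    # Climatology: always protect (cost a) or never protect (loss s).
    expense_climate = min(a, s)
    # Perfect forecast: protect only when the event occurs.
    expense_perfect = s * a
    # Forecast: false alarms and hits cost a, misses cost the full loss.
    expense_forecast = (
        false_alarm_rate * a * (1.0 - s) - hit_rate * s * (1.0 - a) + s
    )
    return (expense_climate - expense_forecast) / (expense_climate - expense_perfect)

# Hypothetical numbers: event occurs 30% of the time, C/L = 0.2.
print(relative_value(hit_rate=0.8, false_alarm_rate=0.1,
                     base_rate=0.3, cost_loss_ratio=0.2))
```

For an ensemble, each probability threshold gives its own hit rate and false alarm rate; taking the largest value over thresholds for each cost/loss ratio is one way to draw the value curve.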